European Psychiatry
● Royal College of Psychiatrists
Preprints posted in the last 90 days, ranked by how well they match European Psychiatry's content profile, based on 10 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Lagerberg, T.; Yukhnenko, D.; Vazquez-Montes, M.; Fanshawe, T. R.; Fazel, S.
Background: External validation of existing risk models is an efficient step towards potential implementation, obviating the need to develop new models. However, validation in new clinical settings poses several challenges. Objective: To externally validate the OxSATS tool using data from the Oxford Monitoring System for Self-harm in England. OxSATS is a validated tool to predict suicide after self-harm developed using Swedish population registers. Methods: We selected episodes of self-harm (ICD-10 codes X60-84; Y10-34) by individuals aged 10-64 years who presented to a large regional hospital between 1 January 2000 and 31 December 2018, and were followed up until 31 December 2019. We applied the OxSATS tool to estimate each individual's suicide risk within 12 months after their index self-harm. We assessed model performance using discrimination (Harrell's c-index) and calibration measures (calibration plot and the observed-to-expected events ratio, O:E). We assessed the effects of missing predictors on calibration and subsequently recalibrated the model. Findings: We identified 16,120 individuals who presented to hospital with self-harm, of whom 101 (0.6%) died by suicide in the 12-month follow-up period. The OxSATS model showed good discrimination in external validation (c-index=0.72, 95% CI=0.67, 0.77). Recalibration was required because initial calibration reflected a lower outcome rate in the new data. After recalibration, calibration performance was excellent (O:E=1.00, 95% CI=0.80, 1.20). Conclusions: Despite differences in clinical services and outcome ascertainment, suicide risk models can maintain good predictive performance in new settings. However, recalibration should be considered when applying prediction models in new settings, and the impact of missing predictors should be assessed using sensitivity analyses.
Key messages. What is already known on this topic: Suicide risk is substantially elevated after hospital presentation for self-harm, but most existing risk assessment tools rely on rating scales or binary cut-offs, show limited predictive accuracy, and rarely report calibration. OxSATS is a prognostic model developed using Swedish register data that provides continuous risk estimates and demonstrated good discrimination and calibration in its original setting. External validation in new healthcare systems is essential before implementation, but is often complicated by differences in predictor definitions, missing variables, and outcome prevalence. What this study adds: This study provides the first external validation of OxSATS in an English clinical setting using routinely collected hospital data. The model retained good discrimination but initially overpredicted suicide risk due to a lower baseline event rate and one missing predictor, highlighting the importance of calibration assessment. How this study might affect research, practice or policy: Future research and implementation strategies should routinely incorporate external validation, sensitivity analyses for missing predictors, and local recalibration before clinical or policy adoption.
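As an illustration of the calibration measures named in the abstract above, the following is a minimal Python sketch of the observed-to-expected (O:E) ratio and of one common recalibration approach, an intercept-only shift on the logit scale ("recalibration in the large"). The abstract does not state which recalibration method the authors used, so the method, the function names, and the toy data here are all assumptions for illustration only.

```python
import math


def observed_expected_ratio(predicted_risks, outcomes):
    """O:E ratio: number of observed events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(predicted_risks)


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_intercept(predicted_risks, outcomes, tol=1e-10):
    """Find one logit-scale shift so that expected events equal observed events.

    Assumption: intercept-only recalibration, solved by bisection. The expected
    count increases monotonically with the shift, so bisection converges.
    """
    observed = sum(outcomes)
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(inv_logit(logit(p) + mid) for p in predicted_risks)
        if expected > observed:
            hi = mid
        else:
            lo = mid
    shift = (lo + hi) / 2.0
    return shift, [inv_logit(logit(p) + shift) for p in predicted_risks]


# Hypothetical toy data (not from the study): the model overpredicts,
# expecting 2.0 events where only 1 was observed.
toy_risks = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4]
toy_outcomes = [0, 0, 0, 1, 0, 0, 0, 0]

oe_before = observed_expected_ratio(toy_risks, toy_outcomes)
shift, recalibrated = recalibrate_intercept(toy_risks, toy_outcomes)
oe_after = observed_expected_ratio(recalibrated, toy_outcomes)
```

An O:E below 1 indicates overprediction, which matches the direction described in the abstract. The shift only moves the average predicted risk, so the ranking of individuals, and therefore the c-index, is unchanged.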
Kanso, N.; Skelton, M.; Rimes, K. A.; Wong, G.; Eley, T. C.; Carr, E.
Background: Depression and anxiety are common mental health conditions in the UK. NHS Talking Therapies offers evidence-based therapies and is the largest provider of treatment, yet only 50% of patients recover. Accurate outcome prediction could identify those at risk of poor outcomes and support more personalised care. This study aimed to develop and internally validate multivariable prediction models using routinely collected data from a large, ethnically diverse sample to enable fair, data-driven treatment decisions. Methods: Data included 30,999 adults who completed high-intensity therapy at a single NHS trust between 2018 and mid-2024. Seven NHS post-treatment outcomes were modelled: reliable improvement, recovery, and reliable recovery for both depression and anxiety, and also functional impairment at the end of treatment. Predictors measured at baseline included sociodemographic and clinical characteristics. Models were developed using elastic net logistic regression and internally validated using bootstrap resampling. Results: The sample was predominantly female (73%) with a median age of 34; 57% identified as White and 22% as Black. Models showed moderate to good discrimination (AUC 0.63-0.77) and strong calibration. Key predictors aligned with clinical expectations, including baseline symptom severity, unemployment, benefit receipt, reporting a disability or long-term condition, and psychotropic medication use, among other sociodemographic factors. Conclusions: This study highlights the potential of data-driven tools to inform clinical decisions and treatment stratification in NHS Talking Therapies. Early identification of patients less likely to benefit from standard care could support timely review, monitoring, or tailored interventions. External validation and implementation research are needed to ensure generalisability and equity in care.
Sivak, L.; Forsman, J.; Sariaslan, A.; Tiihonen, J.; Fazel, S.
Background: Forensic psychiatric services are expanding in many countries, and discharging patients from secure hospitals relies on accurate estimates of risk of adverse outcomes. Novel evidence-based tools for estimating one key risk, violent reoffending, have been developed in recent years. We aimed to externally validate one new tool, FoVOx, in forensic psychiatric patients sentenced to treatment, and to develop an updated model (FoVOx2), incorporating additional clinical predictors. Methods: Using Swedish national registers, we conducted a temporal external validation of FoVOx by examining 767 patients discharged between 2014 and 2023. For the FoVOx2 cohort, 906 patients discharged between 2008 and 2023 were followed up, and additional predictors tested. The outcome was violent reconviction within 12 or 24 months. Model performance was evaluated using Harrell's C-index, time-dependent AUCs, calibration, and classification metrics at predefined thresholds. Results: In temporal validation, FoVOx showed moderate discrimination (AUCs 0.69 and 0.71; C-index = 0.69) and acceptable overall accuracy (Brier <0.11). Calibration was generally good, with mild overestimation at the highest predicted risks (>20%) at 12 months and slight underprediction at 24 months. The updated FoVOx2 model newly incorporated clozapine treatment and additional diagnostic categories. It was associated with improved performance (AUCs 0.77; optimism-corrected C-index = 0.72; Brier 0.06 and 0.09) and achieved good calibration (intercept ≈ 0; slopes 1.03 and 1.05). Conclusions: Updating risk assessment tools with additional clinical factors can lead to incremental improvement in model performance. Implementing tools should consider clinical utility and impact as next steps.
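The Brier scores reported in the abstract above summarise overall accuracy as the mean squared difference between predicted risk and observed outcome. The sketch below shows the basic binary-outcome form in Python. It is an assumption-laden illustration: it ignores censoring (which a time-to-event analysis such as this one would normally have to handle), and the toy numbers are hypothetical, not study data.

```python
def brier_score(predicted_risks, outcomes):
    """Mean squared difference between predicted risks and 0/1 outcomes.

    Lower is better; 0 would be a perfect prediction. Censoring is ignored here.
    """
    if len(predicted_risks) != len(outcomes):
        raise ValueError("predicted_risks and outcomes must have equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted_risks, outcomes)) / len(outcomes)


# Hypothetical toy data: four patients, one of whom had the outcome.
toy_risks = [0.05, 0.10, 0.20, 0.40]
toy_outcomes = [0, 0, 0, 1]
toy_brier = brier_score(toy_risks, toy_outcomes)

# Reference point: a non-informative model that predicts the prevalence for
# everyone scores prevalence * (1 - prevalence).
prevalence = sum(toy_outcomes) / len(toy_outcomes)
reference_brier = prevalence * (1 - prevalence)
```

Because the non-informative reference depends on prevalence, a Brier score is easiest to interpret against the outcome rate of the same cohort rather than as a standalone number.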
Hayes, D.; Wright, J.; Burton, A.; Bu, F.; Sticpewich, L.; Stuttard, H.; Page, J.; Bradbury, A.; Han, E.; Deighton, J.; Tibber, M. S.; Talwar, S.; Fancourt, D.
Background: Prolonged waiting times for Child and Adolescent Mental Health Services (CAMHS) leave many young people without structured support while awaiting specialist treatment. Social prescribing has been proposed as a community-based adjunct within CAMHS pathways; however, evidence regarding its safety and clinical impact remains limited. Methods: Wellbeing While Waiting was a multi-site non-randomised controlled trial embedded within a hybrid type II implementation-effectiveness evaluation conducted across 11 CAMHS in England. The protocol was prospectively published prior to recruitment (BMC Psychiatry; 10.1186/s12888-023-04758-0). Between May 2023 and March 2025, 558 young people aged 11-18 years referred to CAMHS were enrolled (225 usual care; 333 social prescribing). Primary outcomes were anxiety and depression symptoms, total emotional and behavioural difficulties, and perceived stress. Secondary outcomes included resilience and wellbeing. Results: No intervention-related adverse events were observed. On average, participants had 5 sessions with a Link Worker. Compared with usual care, no significant differences were observed in anxiety or depression symptoms. However, participants receiving social prescribing demonstrated significant improvements in total emotional and behavioural difficulties over six months, driven by reductions in conduct difficulties, hyperactivity and peer problems. Significant improvements for those receiving social prescribing were also found for prosocial behaviour and resilience. Conclusions: Within routine CAMHS pathways, no intervention-related adverse events were observed for social prescribing, and social prescribing was associated with improvements in behavioural and resilience-related outcomes, although not in anxiety or depressive symptoms. Findings suggest social prescribing may offer a valuable adjunct during delayed access to specialist treatment, with effects distinct from symptom-focused clinical therapies.
Johnson, L. F.; Giovenco, D.; Eyal, K.; Craig, A.; Petersen, I.; Tlali, M.; Levitt, N. S.; Bachmann, M.; Haas, A. D.; Fairall, L.
Background: Depression is estimated to be the leading cause of disability in South Africa, yet data on depression prevalence and antidepressant use are inconsistent and fragmentary. We present a system dynamics modelling approach to integrate these data and assess trends and inequalities in depression prevalence and treatment access. Methods: We developed a deterministic model of the South African population aged 15 and older, stratified by age, sex, HIV status/stage and susceptibility to depression. Individual transitions between depressed/healthy and treated/untreated states were simulated over time, from 1985. The model was calibrated to depression prevalence data from nine nationally representative household surveys (2002-2024) and ten smaller studies reporting prevalence of antidepressant use, using a Bayesian approach. Results: The model estimated a slight decline in depression point prevalence over time, from 5.1% (95% CI: 4.5-5.6%) in 2002 to 4.5% (95% CI: 4.0-5.0%) in 2024, although with a transient rise in depression prevalence during the COVID-19 period. In 2024, depression prevalence was higher in women (5.3%, 95% CI: 4.7-5.9%) than men (3.6%, 95% CI: 3.2-4.0%), and highest at ages 60 and older. The lifetime prevalence of depression was 70.6% (95% CI: 67.8-73.6); alternative model settings with a more concentrated distribution of depression risk were inconsistent with longitudinal data. The proportion of adults using antidepressants increased from 1.0% (95% CI: 0.8-1.2%) in 2008 to 2.8% (95% CI: 2.2-3.4%) in 2024. In 2024, antidepressant use was 11.0% (95% CI: 8.8-13.5%) in the private sector, compared to only 0.9% (95% CI: 0.7-1.1%) in the rest of the population, and the ratio of new antidepressant initiations to new cases of depression was 0.12 nationally. Conclusion: The prevalence of depression in South Africa has been relatively stable over the last two decades.
Although antidepressant use has increased, overall use remains low, and substantial inequality remains in access to treatment in the public and private health sectors.
Pruin, E.; Milaneschi, Y.; Bartels, M.; Bassani, P.; Penninx, B. W.; Peyrot, W. J.
Background: Genetic liability to depressive disorder can be captured by psychopathology in relatives (family history). Various methods summarize family history in a single score, differing in the information included as well as in the underlying model. We systematically compared the performance of family history indicators, including promising new indicators based on the liability threshold model, in predicting depressive disorder. Methods: We calculated selected family history indicators for depression (dichotomous, proportion, novel genetically-informed method PAFGRS) in 1339 participants of the Netherlands Study of Depression and Anxiety (Ncase = 1086). Polygenic scores (PGS) were computed from the most recent GWAS for major depression. We assessed correlations between genetic liability indicators, as well as their prediction of lifetime depressive disorder diagnosis. Results: Correlations of family history indicators with each other were high (r = 0.71 - 0.99), and much lower with the PGS (r = 0.15). There was a suggested increase in predictive accuracy for more elaborately computed scores, ranging from proportion (AUC = 0.66, OR = 2.26, 95%CI = 1.88-2.71) to PAFGRS (AUC = 0.70, OR = 17.06, 95%CI = 9.46 - 30.77). The best-performing family history indicator and the PGS were independently associated with depressive disorder (PAFGRS: OR = 15.17, 95%CI = 8.36-27.51, p = 3.59 × 10^-19; PGS: OR = 1.30, 95%CI = 1.12-1.50, p = 0.0004). Conclusions: Our analysis shows that more elaborate family history indicators, which incorporate family size, prevalence and heritability and are based on genetic theory, would be preferable to simpler methods. Family history and PGS were complementary in prediction, showing the added value of including both in future studies.
Mattelin, E.; Weyler, H.; Andersson, R.; Paulsen, J.; Tielman, S.; Vikgren, A.; Bondjers, K.; Serlachius, E.; Mataix-Cols, D.; Bragesjo, M.
Objectives: Trauma-focused cognitive behavioural therapy (TF-CBT) is the established first-line treatment for paediatric posttraumatic stress disorder (PTSD), but access to evidence-based care remains limited. This study aimed to evaluate the feasibility and acceptability of a therapist-guided, 12-week, internet-delivered TF-CBT (iTF-CBT) programme for adolescents with PTSD, and to explore preliminary changes in PTSD symptoms. Design: Single-group feasibility trial. Setting: Save the Children, Sweden. Participants: Twenty-two adolescents (13-17 years, 82% female) with primary PTSD. Interventions: A 12-week, therapist-guided, internet-delivered TF-CBT comprising eight modules and parallel caregiver modules with joint child-caregiver activities. Outcomes: Feasibility measures included recruitment pace, participant retention, treatment adherence (module completion), and therapist time. Acceptability was evaluated through satisfaction, credibility, negative effects, and reported adverse events. Preliminary treatment effects were evaluated as within-group changes in PTSD severity using the independent evaluator-rated Clinician-Administered PTSD Scale (CAPS-CA-5) and the self-reported Child and Adolescent Trauma Screen 2 (CATS-2). Assessments occurred at baseline, during treatment, post-treatment, and at 1-month follow-up (primary endpoint). Results: Recruitment was completed after seven months of active enrolment. Retention and adherence were high, satisfaction and credibility ratings were favourable, and no intervention-related serious adverse events occurred. Clinically meaningful within-group improvements were observed at the primary endpoint, with large reductions on CAPS-CA-5 (Cohen's d = 1.27) and CATS-2 (Cohen's d = 1.51). Conclusions: Therapist-guided iTF-CBT for adolescents with PTSD was safe, feasible, acceptable, and associated with clinically meaningful symptom improvements.
These findings support further evaluation in larger, controlled trials to determine efficacy, cost-effectiveness, and long-term outcomes. Trial registration: ClinicalTrials.gov NCT06185244. Article summary. Strengths and limitations of this study: (1) First internet-delivered TF-CBT trial for young people with PTSD. (2) Use of clinician-rated PTSD symptoms (CAPS-CA-5) in combination with validated self-report measures. (3) The intervention was developed in close collaboration with clinicians, alongside contributions from young people. (4) Absence of a control group.
Graupensperger, S.; Brown, M.; Chekroud, A.; Mabe, B.; Kopecky, O.; Srokosz, N.; Hopkins, J.; Hawrilenko, M.
Importance: AI-enabled features may improve the effectiveness of routine mental health care, yet large-scale real-world evidence remains limited. Objective: To evaluate whether access to AI-enabled continuous care features embedded within routine psychotherapy delivery is associated with improved treatment engagement and clinical outcomes under real-world conditions. Design: Preregistered cluster-level, matched, quasi-experimental study using a real-world rollout of AI-enabled continuous care features compared with psychotherapy alone (intention-to-treat framework). Setting: An employer-sponsored behavioral health program providing outpatient psychotherapy for employees and dependents. Participants: Adults initiating a new episode of psychotherapy from 25 employers with access to continuous care features and 75 matched employers without access. Treatment engagement was assessed over 7 weeks (n=26,208), and clinical outcomes were evaluated for up to 180 days (n=5,518). Exposure: Employer-level access to AI-enabled continuous care features supporting engagement and continuity before and between psychotherapy sessions, compared with psychotherapy alone. Main Outcomes: Early treatment engagement (number of psychotherapy sessions attended and time to second session) and changes in depressive and anxiety symptom severity measured using the Patient Health Questionnaire-9 (PHQ-9) and Generalized Anxiety Disorder-7 (GAD-7). Results: Compared with matched controls receiving psychotherapy alone, the intervention group attended 5% more psychotherapy sessions during the first 7 weeks (rate ratio, 1.05 [1.01, 1.10]) and completed their second session sooner (mean difference, -0.62 days [-1.05, -0.18]). Both groups demonstrated substantial symptom improvement over time; however, access to continuous care features was associated with additional improvement in depressive symptoms (d=0.16) and anxiety symptoms (d=0.15) at the median duration of care (day 44).
These effects translated into clinically meaningful differences in reliable improvement by the median duration of care (NNT=25 for both outcomes). Conclusions and Relevance: In this real-world evaluation, access to AI-enabled continuous care features embedded within routine psychotherapy delivery was associated with greater early engagement and a higher likelihood of reliable symptom improvement beyond psychotherapy alone. These findings suggest that augmenting routine psychotherapy with AI-enabled continuous care can meaningfully shift recovery trajectories during a standard treatment episode, strengthening early treatment momentum and improving outcomes at scale. Key Points. Question: Is access to AI-enabled continuous care features embedded within routine psychotherapy delivery associated with improved treatment engagement and clinical outcomes under real-world conditions? Findings: In this cluster-level, matched, quasi-experimental study of adults receiving psychotherapy within an employer-sponsored behavioral health program, access to AI-enabled continuous care features was associated with significantly greater early treatment engagement and faster improvement in depressive and anxiety symptoms compared with psychotherapy alone. Meaning: AI-enabled support features may incrementally enhance the delivery and effectiveness of established psychotherapies when implemented as complements to routine care at scale.
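To put the reported NNT of 25 in context, the number needed to treat is the reciprocal of the absolute difference in outcome rates between groups. The Python sketch below shows that arithmetic. The abstract does not report the underlying reliable-improvement rates, so the 54% and 50% figures are hypothetical values chosen only because they reproduce an NNT of 25; the function name is likewise an assumption.

```python
def number_needed_to_treat(rate_intervention, rate_control):
    """NNT = 1 / absolute difference in the proportion with a good outcome."""
    absolute_difference = rate_intervention - rate_control
    if absolute_difference <= 0:
        raise ValueError("intervention rate must exceed control rate for a benefit NNT")
    return 1.0 / absolute_difference


# Working backwards from the reported NNT: an NNT of 25 implies an absolute
# difference of 1/25, i.e. 4 percentage points.
implied_absolute_difference = 1.0 / 25

# Hypothetical rates (NOT from the abstract) that would give that NNT.
example_nnt = number_needed_to_treat(0.54, 0.50)
```

Read this way, an NNT of 25 means roughly one additional person achieves reliable improvement for every 25 given access to the features, under whatever baseline rate applied in the study.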
Cudic, M.; Meyerson, W. U.; Wang, B.; Yin, Q.; Khadse, P. N.; Burke, T.; Kennedy, C. J.; Smoller, J. W.
Background: Longitudinal measurement of depression severity in outpatient psychiatric care is limited by infrequent standardized assessments. Although psychiatric clinical notes capture illness burden and functional impairment, this information is rarely quantified for analysis. Objective: To evaluate whether large language models (LLMs) can infer clinically meaningful measures of depression severity from outpatient psychiatry notes. Methods: We sampled 91,651 outpatient psychiatry notes from 8,287 adult patients across 58 clinics within a large academic medical center between 2015 and 2021. A HIPAA-compliant LLM (OpenAI GPT-5.2) was prompted to independently estimate three depression severity scores (Patient Health Questionnaire-9 [PHQ-9], Hamilton Depression Rating Scale [HAM-D], and depression-specific Clinical Global Impression-Severity [CGI-S]) from notes, with patient-reported PHQ-9 content within notes redacted to prevent biasing. Convergent validity was assessed against patient-reported PHQ-9 (n=3,757), study-clinician chart review (n=125), and treating-clinician suicide risk assessments (SRA; n=2,985). Predictive validity was evaluated using survival models of antidepressant switching and psychiatric emergency visits. Discriminant validity across diagnoses and consistency across demographic groups and clinics were also evaluated. Results: 10.8% of eligible visits had a PHQ-9 recorded within 7 days before the encounter. LLM-inferred PHQ-9 scores showed moderate agreement with patient-reported PHQ-9 (Cohen's κ=0.64, 95%CI: 0.62-0.66; Pearson r=0.67, 95%CI: 0.65-0.68). Stronger agreement was found between LLM CGI-S and study-clinician chart review (κ for rater 1=0.79, 95%CI: 0.70-0.85; κ for rater 2=0.67, 95%CI: 0.58-0.77; r=0.86 with mean rating, 95%CI: 0.80-0.90).
In prospective analyses, LLM CGI-S predicted antidepressant switching (C-index=0.60; 95%CI: 0.58-0.62) and psychiatric emergency visits (C-index=0.63; 95%CI: 0.57-0.68), which was comparable to the predictive performance of patient-reported PHQ-9 and treating-clinician SRA. Correlations between LLM CGI-S and patient-reported PHQ-9 were consistent across clinics (I²<0.1) but significantly lower among Black (r=0.48, 95%CI: 0.38-0.57) and Hispanic (r=0.43, 95%CI: 0.27-0.56) patients. Conclusions: LLM-inferred depression severity scores from psychiatric outpatient notes support longitudinal, standardized phenotyping of depression severity, such as for routine outcome monitoring. These results have implications for facilitating genetic, pharmacoepidemiologic, and antidepressant treatment effectiveness studies using real-world evidence.
Rohde, C.; Ostergaard, S. D.
Objectives: Electroconvulsive Therapy (ECT) is an effective treatment for bipolar disorder, particularly in severe acute cases or for illness resistant to pharmacotherapy. However, the risk of relapse following ECT is high, necessitating intervention to reduce this risk. Based on findings from ECT studies in unipolar depression and its well-known mood-stabilizing properties, it is likely that lithium treatment may reduce the risk of relapse of bipolar disorder following ECT. Therefore, we conducted a target trial emulation using data from Danish nationwide registers to investigate whether lithium protects against relapse following ECT treatment of bipolar disorder. Methods: Patients discharged from their first psychiatric admission with a primary diagnosis of bipolar disorder between January 1, 2006, and June 1, 2024, who received at least six ECT treatments, were included. Follow-up began two weeks after discharge and continued until relapse, death, one year, or January 1, 2025. Patients were considered allocated to lithium treatment if they redeemed a prescription for lithium within the first two weeks after discharge from the index admission (ECT treatment). The outcome was time to relapse, defined by either psychiatric hospital admission or suicide. Cox proportional hazards regression, adjusted for potential confounders, was used to compare the outcome between patients allocated and not allocated to lithium treatment. Results: Among the 574 eligible patients (mean age 41.5 years, 61.3% women), 214 (37.3%) were allocated to lithium treatment and 360 (62.7%) were not allocated to lithium treatment. During follow-up, 56 patients (26.2%) in the lithium group and 135 patients (37.5%) in the non-lithium group experienced a relapse. Lithium treatment was associated with a substantially reduced risk of relapse (adjusted hazard rate ratio, 0.60, 95% CI=0.43-0.84). Conclusion: Lithium treatment after ECT may reduce the risk of relapse in patients with bipolar disorder.
These findings should be followed up by a randomized controlled trial.
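The counts reported in the abstract above can be checked with simple arithmetic. The Python sketch below recomputes the group proportions and a crude (unadjusted) risk ratio from those counts. The crude ratio is an illustrative quantity computed here, not a result reported by the authors; it ignores follow-up time and confounding, so it is not expected to equal the adjusted hazard rate ratio of 0.60.

```python
# Counts taken from the abstract.
n_lithium, relapse_lithium = 214, 56
n_no_lithium, relapse_no_lithium = 360, 135
n_total = n_lithium + n_no_lithium

# Proportion allocated to lithium, and crude relapse proportions per group.
share_lithium = n_lithium / n_total
risk_lithium = relapse_lithium / n_lithium
risk_no_lithium = relapse_no_lithium / n_no_lithium

# Crude risk ratio (illustrative only; unadjusted and not time-to-event).
crude_risk_ratio = risk_lithium / risk_no_lithium

print(f"allocated to lithium: {100 * share_lithium:.1f}%")
print(f"relapse, lithium: {100 * risk_lithium:.1f}%")
print(f"relapse, no lithium: {100 * risk_no_lithium:.1f}%")
print(f"crude risk ratio: {crude_risk_ratio:.2f}")
```

The recomputed percentages agree with those in the abstract. The gap between the crude ratio and the adjusted estimate reflects the Cox model's adjustment for confounders and its use of time to relapse.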
Whitfield, J.; Goh, A.
Background: AI-powered cognitive behavioural therapy (AI-CBT) tools hold significant promise for addressing the global mental health treatment gap, yet sustained user engagement remains critically low. While patient attitudes and experiential factors have been qualitatively documented, the psychological mechanisms through which AI literacy translates into long-term engagement remain poorly understood. Existing systematic evidence highlights trust, perceived therapeutic alliance, and stigma as salient themes, but no large-scale quantitative study has modelled these as a mediated pathway. Objective: This study aimed to (1) examine whether trust in AI systems and perceived therapeutic alliance mediate the relationship between AI literacy and sustained AI-CBT engagement, and (2) determine whether mental health stigma moderates these mediated pathways. Methods: A cross-sectional national online survey was conducted in the United Kingdom (N = 1,247). Eligible adults (18+) with a history of anxiety or depression who had used an AI-CBT tool in the preceding 12 months were recruited via stratified random sampling. Structural equation modelling (SEM) with moderated mediation was conducted in R (lavaan 0.6-17). Moderated mediation was evaluated using the PROCESS macro framework adapted for SEM, with 5,000 bootstrap replications for bias-corrected confidence intervals. Model fit was assessed using CFI, TLI, RMSEA, and SRMR indices. Results: The final SEM demonstrated excellent fit (CFI = 0.967, TLI = 0.959, RMSEA = 0.043 [90% CI: 0.036-0.051], SRMR = 0.052). AI literacy exerted a significant indirect effect on sustained engagement through trust in AI (β = 0.213, SE = 0.031, p < .001) and perceived therapeutic alliance (β = 0.187, SE = 0.028, p < .001). Mental health stigma significantly moderated the trust → engagement pathway (ΔR² = 0.042, p = .003), with the indirect effect being stronger among individuals with lower stigma scores.
The total indirect effect accounted for 58.4% of the total effect of AI literacy on engagement. Conclusions: AI literacy promotes sustained AI-CBT engagement primarily through its effects on trust and perceived therapeutic alliance, pathways that are attenuated by mental health stigma. These findings underscore the need for stigma-reduction interventions and AI literacy programmes as implementation strategies. Findings have direct implications for the design and deployment of AI-CBT tools across UK NHS digital mental health services.
Bartal, A.; Allouche-Kam, H.; Elhasid Felsenstein, T.; Dassopoulos, E. C.; Lee, M.; Edlow, A. G.; Orr, S. P.; Dekel, S.
Objective: Posttraumatic stress disorder (PTSD) after a traumatic birth is a serious but overlooked maternal morbidity, affecting ~20% of women following medically complicated deliveries. PTSD can undermine maternal caregiving. Rapid screening tools suited to busy obstetric settings are lacking. We developed and evaluated a brief screener, derived from the 20-item PTSD Checklist for DSM-5 (PCL-5), to identify PTSD related to childbirth. Study Design: We enrolled 107 women with traumatic childbirth. Participants completed the PCL-5 and the gold-standard clinician diagnostic interview for PTSD (CAPS-5); depression was measured with the Edinburgh Postnatal Depression Scale (EPDS). Bootstrap resampling with LASSO regression identified PCL-5 items most associated with PTSD. Firth logistic regression models estimated diagnostic accuracy. Sensitivity, specificity, area under the ROC curve (AUC), and Youden's J statistic determined performance and optimal cut-off. Results: A six-item version of the PCL-5 (PCL-5 R6), statistically derived from the full scale, showed excellent discrimination for PTSD compared with clinician evaluation (AUC = 0.95; 95% CI, 0.89-1.00). A cut-off score of 7 yielded high sensitivity (0.96) and good specificity (0.83), with an overall diagnostic efficiency of 0.86, detecting most PTSD cases while minimizing false positives. The PCL-5 R6 correlated moderately with the EPDS (rho = 0.53), showing that a depression screen alone cannot reliably detect PTSD. Conclusions: A short, 6-item PCL-5 provides a valid, efficient tool for detecting childbirth PTSD. Its brevity and accuracy make it practical for integration into routine postpartum care, enabling timely mental health screening.
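The cut-off selection described in the abstract above relies on Youden's J, defined as sensitivity + specificity - 1. The Python sketch below shows how a cut-off maximising J could be found on a small dataset and computes J for the sensitivity and specificity the abstract reports. The scores and labels are hypothetical toy values, the function names are assumptions, and the J value of 0.79 is derived here from the reported figures rather than quoted from the abstract.

```python
def youden_j(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as screen-positive; labels are 1 (case) or 0."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return (cutoff, J) for the observed score that maximises Youden's J."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = youden_j(sens, spec)
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# J implied by the sensitivity and specificity reported in the abstract.
reported_j = youden_j(0.96, 0.83)

# Hypothetical toy screener scores and diagnoses (not study data).
toy_scores = [1, 2, 3, 4, 5, 6, 6, 8, 9, 10]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
toy_cutoff, toy_j = best_cutoff(toy_scores, toy_labels)
```

Maximising J weights sensitivity and specificity equally. A screening programme that prioritises catching cases could reasonably prefer a lower cut-off than the J-optimal one.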
Jin, K. W.; Rostam-Abadi, Y.; Chaudhary, P.; Garrett, M. A.; Huang, A. S.; Montelongo, M.; Nagpal, C.; Shei, J.; Weathers, J.; Zhang, J. S.; Chen, Q.; Kim, J.; Malgaroli, M.; Mathis, W. S.; Rodriguez, C. I.; Selek, S.; Sharma, M. S.; Pittenger, C.; Yip, S. W.; Zaboski, B. A.; Xu, H.
Importance: Large language models (LLMs) have demonstrated diagnostic potential in several medical specialties, but their application to psychiatry - where diagnosis relies heavily on clinical judgment, narrative interpretation, and reasoning under uncertainty - remains insufficiently evaluated. Objective: To evaluate diagnostic accuracy and clinician-judged reasoning quality of multiple large language models using psychiatric case vignettes. Design: Mixed-methods evaluation study of diagnostic accuracy across four LLMs using 196 psychiatric case vignettes (135 published and 61 novel). Clinical reasoning quality was evaluated on a randomly selected subset of 30 vignettes using structured clinician ratings along two reasoning dimensions. The highest-performing model was illustratively compared with psychiatry trainees on the same subset. Diagnostic correctness for the full vignette set was assessed by a separate adjudicator LLM. Setting: Publicly available model interfaces, December 2025. Participants: Five board-certified psychiatrists evaluated model-generated clinical reasoning. Two psychiatry residents served as the illustrative human comparison. Main Outcomes and Measures: Diagnostic accuracy and clinician-rated clinical reasoning quality. Diagnostic accuracy was assessed using top-1 accuracy, top-5 accuracy, recall@5, and mean reciprocal rank based on ranked lists of five differential diagnoses per vignette. Clinical reasoning quality was assessed using two 5-point Likert scales adapted from the Accreditation Council for Graduate Medical Education Psychiatry Residency Milestones, evaluating data extraction and diagnostic reasoning. Results: Across 196 psychiatric case vignettes, Claude Opus 4.5 (Anthropic) achieved the highest diagnostic accuracy (top-1 accuracy, 0.638; top-5 accuracy, 0.801; recall@5, 0.731; mean reciprocal rank, 0.710) and clinician-rated reasoning scores.
Higher clinician-rated diagnostic reasoning quality was strongly associated with diagnostic correctness in mixed-effects logistic regression analyses (β = 1.80; p < 0.001), corresponding to an approximately six-fold increase in odds of a correct diagnosis per 1-point increase in reasoning score. In an illustrative comparison, diagnostic accuracy of Claude Opus 4.5 fell within the range observed for psychiatry trainees. Conclusions and Relevance: LLMs demonstrated high diagnostic accuracy and generated clinical reasoning that clinicians judged to be largely coherent and safe. Diagnostic reasoning quality was more strongly associated with diagnostic correctness than data extraction quality, underscoring the importance of evaluating reasoning alongside accuracy when assessing LLMs for clinical decision support in psychiatry. Key Points. Question: Can multiple large language models accurately diagnose psychiatric conditions and generate diagnostic reasoning that clinicians judge as coherent, safe, and clinically meaningful? Findings: Across 196 psychiatric case vignettes, four large language models demonstrated high diagnostic accuracy. In a clinician-evaluated subset of 30 vignettes, model diagnostic accuracy fell within the range observed for psychiatry residents. Clinicians judged model-generated diagnostic reasoning to be largely coherent and safe. Higher clinician-rated reasoning quality was strongly associated with diagnostic correctness, independent of data extraction quality. Meaning: Evaluating diagnostic reasoning, in addition to accuracy, may be important when assessing large language models for potential clinical decision support in psychiatry.
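Two pieces of arithmetic in the abstract above lend themselves to a short illustration: the ranking metrics (top-k accuracy and mean reciprocal rank) and the conversion of a logistic regression coefficient to an odds ratio. The Python sketch below assumes a single reference diagnosis per vignette, which is a simplification (the study reports recall@5 separately from top-5 accuracy, which suggests some vignettes may have more than one accepted diagnosis). The vignettes, diagnoses, and function names are hypothetical.

```python
import math


def reciprocal_rank(ranked_diagnoses, reference):
    """1/rank of the reference diagnosis in the ranked list, or 0 if absent."""
    for rank, diagnosis in enumerate(ranked_diagnoses, start=1):
        if diagnosis == reference:
            return 1.0 / rank
    return 0.0


def top_k_accuracy(cases, k):
    """Share of cases whose reference diagnosis appears in the first k entries."""
    hits = sum(1 for ranked, reference in cases if reference in ranked[:k])
    return hits / len(cases)


def mean_reciprocal_rank(cases):
    return sum(reciprocal_rank(ranked, ref) for ranked, ref in cases) / len(cases)


# Hypothetical toy vignettes: (ranked differential, reference diagnosis).
toy_cases = [
    (["MDD", "GAD", "PTSD", "bipolar II", "adjustment disorder"], "MDD"),
    (["GAD", "panic disorder", "OCD", "MDD", "PTSD"], "panic disorder"),
    (["schizophrenia", "bipolar I", "MDD", "delirium", "OCD"], "schizoaffective disorder"),
    (["OCD", "GAD", "MDD", "PTSD", "anorexia nervosa"], "OCD"),
]

toy_top1 = top_k_accuracy(toy_cases, 1)
toy_top5 = top_k_accuracy(toy_cases, 5)
toy_mrr = mean_reciprocal_rank(toy_cases)

# A logistic regression coefficient converts to an odds ratio via exp(beta);
# the reported beta of 1.80 gives roughly a six-fold increase in odds.
odds_ratio_per_point = math.exp(1.80)
```

Mean reciprocal rank rewards placing the correct diagnosis near the top of the differential, so it sits between top-1 and top-5 accuracy, as it does in the reported results.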
Lim, A.; Pemberton, J.
Background: The NHS Improving Access to Psychological Therapies (IAPT) programme, now rebranded as NHS Talking Therapies, faces persistent capacity constraints with average wait times exceeding 90 days for cognitive behavioral therapy (CBT) in many Clinical Commissioning Group areas. AI-powered CBT platforms have been introduced as a digital adjunct within stepped care, yet longitudinal evidence on anxiety symptom trajectories and their predictors in routine NHS settings remains limited. Objective: To model individual anxiety symptom trajectories among patients referred to an AI-powered CBT platform within NHS primary care, identify distinct trajectory classes, and examine patient-level and practice-level predictors of differential treatment response using multilevel growth curve modeling. Methods: A prospective cohort study was conducted using linked clinical and administrative data from 6,284 patients (aged 18-65) referred to the CalmLogic AI-CBT platform across 187 general practices in four NHS England Integrated Care Systems (ICSs) between April 2023 and September 2025. Patients completed GAD-7 assessments at baseline, 4 weeks, 8 weeks, 12 weeks, and 24 weeks. Three-level growth curve models (assessments nested within patients nested within practices) with random intercepts and random slopes were fitted. Growth mixture modeling (GMM) was subsequently applied to identify latent trajectory classes. Predictors were examined at Level 2 (patient demographics, baseline severity, comorbidities, digital literacy, engagement intensity) and Level 3 (practice deprivation index, list size, urban/rural classification, and IAPT wait time). Results: The unconditional growth model revealed a significant average linear decline in GAD-7 scores of -0.94 points per month (p < .001), with substantial between-patient variation in both intercepts (variance = 14.82, p < .001) and slopes (variance = 0.38, p < .001). 
Significant between-practice variation accounted for 8.7% of intercept variance (ICC = 0.087). Growth mixture modeling identified four distinct trajectory classes: Rapid Responders (28.4%, steep early decline stabilising by week 8); Gradual Improvers (34.1%, steady linear decline through 24 weeks); Partial Responders (22.8%, modest early improvement followed by a plateau at clinically significant levels); and Non-Responders (14.7%, minimal change or slight deterioration). Higher baseline severity, female gender, and greater module completion predicted membership in the Rapid Responder class. Practice-level IAPT wait times exceeding 90 days independently predicted faster improvement trajectories (coefficient = -0.31, p = .003), suggesting that AI-CBT has its greatest incremental value in capacity-constrained areas. Patients in the most deprived quintile showed slower trajectories (coefficient = 0.22, p = .011) despite equivalent engagement levels, indicating a deprivation-related treatment response gap. Conclusions: AI-powered CBT platforms integrated within NHS primary care produce significant anxiety symptom reduction on average, but treatment response is heterogeneous, with four distinct trajectory classes identified. The finding that longer IAPT wait times predict better AI-CBT outcomes supports the platform's positioning as a scalable bridge intervention for capacity-constrained services. The deprivation-related response gap warrants targeted support strategies for patients in the most disadvantaged communities.
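To make the reported intraclass correlation concrete, the sketch below shows how an ICC relates two variance components. It rests on a loud assumption that the abstract does not confirm: that the ICC of 0.087 is the practice-level share of intercept variance, with the between-patient intercept variance (14.82) as the only other component. The function names are hypothetical.

```python
def icc(between_var, within_var):
    """Intraclass correlation: share of variance at the higher (cluster) level."""
    return between_var / (between_var + within_var)


def implied_between_var(icc_value, within_var):
    """Back out the cluster-level variance implied by an ICC and the lower-level variance."""
    return icc_value * within_var / (1.0 - icc_value)


patient_intercept_var = 14.82  # between-patient intercept variance from the abstract
reported_icc = 0.087           # between-practice share from the abstract

# ASSUMPTION: only these two components enter the ICC denominator.
practice_var = implied_between_var(reported_icc, patient_intercept_var)

print(f"implied between-practice intercept variance: {practice_var:.2f}")
print(f"round-trip ICC: {icc(practice_var, patient_intercept_var):.3f}")
```

Under that assumption the implied between-practice intercept variance is roughly 1.41; a different decomposition (for example, one that includes residual variance) would give a different value.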
Provaznikova, B.; de Bardeci, M.; Altamiranda, E.; Ip, C.-T.; Monn, A.; Weber, S.; Jungwirth, J.; Rohde, J.; Prinz, S.; Kronenberg, G.; Bruehl, A.; Bracht, T.; Olbrich, S.
Objective: Major depressive episodes frequently show limited response to first-line treatments, motivating the search for objective biomarkers. EEG/ECG-based support tools aggregating electrophysiological predictors may guide treatment selection. We examined whether antidepressant treatments concordant with an EEG/ECG-biomarker report were associated with higher response rates. Methods: We retrospectively analyzed adults with ICD-10 depressive disorder or bipolar depression treated with electroconvulsive therapy (ECT), repetitive transcranial magnetic stimulation (rTMS), (es)ketamine, or selective serotonin reuptake inhibitors (SSRIs) between 2022 and 2024. Resting-state EEG with simultaneous ECG generated individualized biomarker reports with modality-specific response likelihoods. Treatment chosen by clinical teams was classified as concordant or non-concordant; response was derived from routinely collected clinical scales. Results: Among 153 patients (ECT n=53, rTMS n=48, (es)ketamine n=36, SSRIs n=16), response rates were higher for concordant vs non-concordant treatments: ECT 70% vs 50%, rTMS 30% vs 13%, (es)ketamine 31% vs 10%, and SSRIs 100% vs 11%. Overall, 46% (42/92) of concordant vs 26% (14/54) of non-concordant patients responded (absolute difference +20 percentage points; relative increase ≈77%; number needed to treat ≈5). Conclusion: Concordance with EEG/ECG biomarkers correlated with higher treatment response, warranting confirmation in prospective trials. Significance: EEG/ECG-based decision support may enhance antidepressant treatment response in everyday clinical practice.
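The overall effect measures quoted in the abstract above can be recomputed from the reported counts (42/92 concordant, 14/54 non-concordant). The sketch below is a plain arithmetic illustration with a hypothetical helper name; because the data are retrospective, the "number needed to treat" is a descriptive figure here, not a causal estimate.

```python
def response_summary(resp_a, n_a, resp_b, n_b):
    """Absolute and relative differences in response rate between two groups."""
    rate_a = resp_a / n_a
    rate_b = resp_b / n_b
    diff = rate_a - rate_b
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "absolute_difference": diff,
        "relative_increase": diff / rate_b,
        "nnt": 1.0 / diff,
    }


# Counts reported in the abstract: concordant 42/92, non-concordant 14/54.
s = response_summary(42, 92, 14, 54)

print(f"concordant: {s['rate_a']:.1%}, non-concordant: {s['rate_b']:.1%}")
print(f"absolute difference: {s['absolute_difference'] * 100:.1f} percentage points")
print(f"relative increase: {s['relative_increase']:.1%}")
print(f"NNT: {s['nnt']:.2f}")
```

From the unrounded counts the relative increase is about 76% and the NNT about 5.1; the abstract's ≈77% is what the rounded rates (46% and 26%) give.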
Reinecke-Tellefsen, C. J.; Orberg, A.; Ostergaard, S. D.
The COVID-19 pandemic had a substantial impact on healthcare systems across the globe, including psychiatric services. Use of electroconvulsive therapy (ECT), a lifesaving intervention for severe mental illness, was reported to have declined during the pandemic in several countries, but nationwide data remain scarce. Using nationwide data from the Danish National Patient Register, we examined all ECT treatments administered in Denmark from September 2019 to May 2025. Weekly treatment numbers were visualized across the three national COVID-19 lockdowns to descriptively assess changes in ECT use. A notable reduction in ECT treatments was observed in the weeks preceding and during the first lockdown (March 11 to May 18, 2020). A post-hoc estimation indicated approximately 1,366 "missed" treatments during the initial pandemic phase in 2020. When these were added to the 27,033 treatments delivered in 2020, the adjusted total approximated annual treatment volumes in 2019 and 2022, suggesting a temporary disruption rather than sustained decline. In contrast, ECT activity during the second and third lockdowns appeared largely unaffected. These findings suggest that ECT provision in Denmark was temporarily reduced during the initial phase of the pandemic but remained resilient thereafter. In the case of a future pandemic, safeguarding timely access to ECT, particularly in early phases, should be prioritized given its critical role in the treatment of severe mental illness.
Wickersham, A.; Soneson, E.; Adamo, N.; Colling, C.; Jewell, A.; Downs, J.
Background: A study conducted in Norway showed that the association between pupil mental health diagnoses and educational attainment has weakened over time. One possible explanation is that earlier detection of mental health problems in recent years has facilitated earlier treatment, intervention, and educational support that might improve academic outcomes. We investigated whether the weakening association between mental health and attainment could be replicated in England, and explained by earlier age at first diagnosis. Methods: This was a secondary longitudinal data analysis of de-identified records from a secondary mental healthcare provider in England, which have been linked to the Department for Education's National Pupil Database. We included n=149,841 pupils residing in South East London, born 1993-2003, who completed their end-of-school exams 2009-2019. The main exposure variables were ADHD and internalising disorder diagnosis. In linear regressions, we investigated their associations with Year 11 attainment (typically assessed age 15-16 years), whether this was modified by birth year, and the role of age at first diagnosis. Results: On average, ADHD (n=844, 0.6%) and internalising disorder (n=2,523, 1.7%) were associated with lower Year 11 attainment. However, significant interactions between diagnosis and birth year suggested that pupils with these disorders showed increases in standardised exam scores over successive birth cohorts, resulting in a closing attainment gap over time. While age at first diagnosis became younger over the period, this did not confound the observed associations. Conclusions: We replicated findings from Norway that suggest a narrowing attainment gap between those with and without ADHD and internalising disorder diagnoses. Building on this, we ruled out earlier age of diagnosis as a possible explanation for this phenomenon. 
With administrative data research growing internationally, we are increasingly able to replicate mental health and education trends in different countries, opening more opportunities for international collaboration.
Voelker, M. P.; Kresken, A. L.; Foo, J. C. P.; Frank, J.; Reinhard, I.; Klinger-Koenig, J.; Zillich, L.; Ferreira de Sa, D. S.; Mikolajczyk, R.; Leitzmann, M.; Bohmann, P.; Krist, L.; Keil, T.; Meinke-Franze, C.; Riedel-Heller, S. G.; Greiser, H.; Bohn, B.; Brenner, H.; Obi, N.; Harth, V.; Pischon, T.; Grabe, H. J.; Berger, K.; Schwarz, E.; Mata, J.; Witt, S.; Streit, F.
Background: Childhood maltreatment is a major risk factor for depression and may contribute to sex differences in depression prevalence. We examined sex-specific associations between childhood maltreatment and depression and estimated the proportion of depression cases attributable to specific maltreatment subtypes. Methods: We analyzed baseline data from 159,045 participants (49.4% women; aged 19-72) in the German National Cohort (NAKO). Childhood maltreatment was assessed via the Childhood Trauma Screener; depression via self-reported physician's diagnosis and MINI classification (lifetime) and the PHQ-9 (current). Associations, including sex interactions, were modeled using binary logistic regressions. Mediation analyses and sex-stratified population attributable fractions (PAFs) quantified the contribution of maltreatment to depression. Results: Maltreatment was associated with increased odds of lifetime (physician's diagnosis OR = 2.45 [2.38, 2.53]; MINI OR = 2.30 [2.18, 2.43]) and current depression (OR = 2.90 [2.79, 3.02]). Sex interactions were observed for the physician's diagnosis: physical abuse and neglect had stronger associations in women (physical abuse OR = 2.74 [2.59, 2.90]; physical neglect OR = 1.36 [1.28, 1.44]) than men (physical abuse OR = 2.36 [2.21, 2.52]; physical neglect OR = 1.08 [1.00, 1.16]), whereas sexual abuse showed stronger associations in men (OR = 3.23 [2.91, 3.57]) than women (OR = 2.61 [2.48, 2.75]). Overall, childhood maltreatment accounted for 21.2-26.2% of lifetime and 33.4% of current depression. PAFs were higher in women than men for lifetime (24.5-28.5% vs. 16.0-20.9%) and current depression (36.1% vs. 28.2%). Emotional abuse and neglect contributed the highest PAFs (up to 10.2%). Maltreatment mediated 18.9-30.0% of the association between sex and depression. Conclusion: Maltreatment, especially emotional subtypes, accounts for a substantial proportion of depression in both sexes, with stronger overall associations in women. 
Sex-specific prevention may help reduce depression prevalence.
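As a rough illustration of what a population attributable fraction expresses, the sketch below uses Levin's classic formula, PAF = p(RR - 1) / (1 + p(RR - 1)). This is an assumption for illustration only: the abstract does not state which PAF estimator the authors used, and the exposure prevalence and relative risk below are hypothetical values, not study results.

```python
def levin_paf(exposure_prevalence, relative_risk):
    """Levin's population attributable fraction: p(RR - 1) / (1 + p(RR - 1))."""
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# HYPOTHETICAL inputs, not taken from the study:
# 30% of the population exposed, exposure doubles the risk.
paf = levin_paf(exposure_prevalence=0.30, relative_risk=2.0)

print(f"PAF = {paf:.1%}")
```

The formula makes explicit why a PAF depends on both how common the exposure is and how strongly it is associated with the outcome, so two groups with similar odds ratios can still have different PAFs.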
Desbeaumes Jodoin, V.; Bousseau, E.; Trottier-Duclos, F.; Jutras-Aswad, D.; Lesperance, F.; Nguyen, D. K.; Bou Assi, E.; Blumberger, D. M.; Arns, M.; Bakert, T. E.; Daskalakis, Z.; Lesperance, P.; Miron, J.-P.
Background: Intermittent theta burst stimulation (iTBS) and H-coil repetitive transcranial magnetic stimulation (rTMS) are FDA-cleared treatments for major depression, yet their comparative effectiveness in treatment-resistant depression (TRD) has not been evaluated in randomized trials. This pilot randomized trial was designed to obtain preliminary comparative estimates and to explore whether baseline cognitive functioning relates to early remission. Methods: Twenty-eight adults with TRD were randomized to six weeks of iTBS delivered to the dorsolateral prefrontal cortex (DLPFC) using a figure-8 coil (n=15) or H-coil rTMS delivered to the dorsomedial prefrontal cortex (DMPFC) using an H7-coil (n=13). The primary outcome was change in 17-item Hamilton Depression Rating Scale (HRSD-17) score from baseline to week 6, analyzed with ANCOVA. Additional outcomes included response, remission, and symptom trajectories through week 18. Exploratory analyses examined the association between baseline cognitive functioning, such as executive functions and memory, and remission. Results: Twenty-five participants completed all 30 sessions. Adjusted week-6 HRSD-17 scores did not differ between groups (mean difference -0.40, 95% CI -5.23 to 4.43; p=.865). Response rates were 40.0% for iTBS and 50.0% for H-coil (p>.60), and remission rates were identical across groups (20.0%). Remitters showed higher baseline executive functioning than non-remitters in exploratory analyses, although these associations were not confirmed in adjusted models. Conclusion: In this pilot trial, iTBS and H7-coil rTMS showed symptom improvement, with no clear between-group pattern. Exploratory findings suggest a potential signal involving executive functioning that warrants further investigation. These results inform the feasibility and design of larger comparative trials. Trial registration: ClinicalTrials.gov (NCT05902312)
WANG, X. X.
Background: Suicide prevention has become a global public health priority, and Brief Contact Interventions (BCI) following suicide attempts (SA) are an important tool for preventing suicides. The VigilanS project was designed to generalize compounded BCIs at the entire population level. It involves resource cards, telephone calls, and mailings, following a predefined algorithm. It has been implemented progressively in France, on a region-by-region basis, since 2015. Objective: To evaluate the effectiveness of VigilanS in reducing suicide attempts among patients aged 18 years and older, and to explore potential differences in effectiveness by sex, age, and geographical location. Methods: The study used data from the French national hospitalization database, PMSI-MCO. It included all patients over age 18 who were admitted to general hospitals for suicide attempts, between 2012 and 2022. Time-to-event ("survival") analysis of a second SA after a first one was performed; patients whose first SA occurred before VigilanS implementation were compared with their after-VigilanS counterparts. Six regions, with implementation occurring between 2015 and 2017, are analyzed here. Results: The differences in distribution of time-to-new-SA among patients before and after VigilanS implementation were statistically significant in all six regions under scope (log-rank test: P<0.0001). The Cox regression analysis revealed that VigilanS was significantly associated with a reduced risk of reattempting suicide in all regions. Age consistently showed a negative association with reattempting suicide. Conclusion: VigilanS is likely effective in reducing suicide attempts among patients aged 18 years and older in France. This suggests that implementing BCIs following SAs in general hospitals at a population-wide level can contribute to reducing suicide rates and provides real-world evidence (RWE).
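The time-to-event comparison described above can be pictured with a Kaplan-Meier estimate of the proportion of patients who have not reattempted by a given time. The sketch below is a minimal pure-Python illustration, not the authors' analysis (which used log-rank tests and Cox regression); the function name and the two small cohorts are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(time, survival)] at each distinct event time.

    times:  follow-up duration per patient (e.g. months since first attempt)
    events: 1 if a reattempt was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# HYPOTHETICAL cohorts (months of follow-up, reattempt indicator).
before = kaplan_meier([3, 6, 6, 12, 18, 24], [1, 1, 0, 1, 0, 0])
after = kaplan_meier([5, 10, 14, 20, 24, 24], [1, 0, 0, 1, 0, 0])

for label, curve in (("before", before), ("after", after)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Censored patients (those without an observed reattempt by the end of follow-up) leave the risk set without lowering the curve, which is what distinguishes this estimate from a simple proportion.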